202 research outputs found

    Detail-Preserving Pooling in Deep Networks

    Full text link
    Most convolutional neural networks use some method for gradually downscaling the size of the hidden layers. This is commonly referred to as pooling, and is applied to reduce the number of parameters, improve invariance to certain distortions, and increase the receptive field size. Since pooling is by nature a lossy process, it is crucial that each such layer maintains the portion of the activations that is most important for the network's discriminability. Yet simple maximization or averaging over blocks (max or average pooling), or plain downsampling in the form of strided convolutions, remain the standard. In this paper, we aim to leverage recent results on image downscaling for the purposes of deep learning. Inspired by the human visual system, which focuses on local spatial changes, we propose detail-preserving pooling (DPP), an adaptive pooling method that magnifies spatial changes and preserves important structural detail. Importantly, its parameters can be learned jointly with the rest of the network. We analyze some of its theoretical properties and show its empirical benefits on several datasets and networks, where DPP consistently outperforms previous pooling approaches. (Comment: To appear at CVPR 2018)
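    As a rough illustration of the idea behind DPP, the sketch below implements an inverse-bilateral weighted 2x2 pooling layer in PyTorch: pixels that deviate from a smoothed reference are weighted more strongly, and the magnification exponent is a learnable parameter. This is a minimal reading of the abstract, not the authors' exact formulation; the use of average pooling for the reference signal and the parameter names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DetailPreservingPool2d(nn.Module):
    """Sketch of a detail-preserving 2x2 pooling layer (inverse-bilateral weighting)."""

    def __init__(self, eps: float = 1e-3):
        super().__init__()
        self.lam = nn.Parameter(torch.ones(1))  # learnable detail-magnification exponent
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Smoothed reference: plain 2x2 average pooling, upsampled back (assumption).
        ref = F.interpolate(F.avg_pool2d(x, 2), scale_factor=2, mode="nearest")
        # Pixels that differ strongly from the reference receive larger weights.
        detail = torch.sqrt((x - ref) ** 2 + self.eps ** 2)
        w = detail ** self.lam
        # Weighted average over each 2x2 block.
        return F.avg_pool2d(w * x, 2) / (F.avg_pool2d(w, 2) + self.eps)

# y = DetailPreservingPool2d()(torch.randn(1, 64, 32, 32))  # -> shape (1, 64, 16, 16)
```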

    Virtual Rephotography: Novel View Prediction Error for 3D Reconstruction

    Full text link
    The ultimate goal of many image-based modeling systems is to render photo-realistic novel views of a scene without visible artifacts. Existing evaluation metrics and benchmarks focus mainly on the geometric accuracy of the reconstructed model, which is, however, a poor predictor of visual accuracy. Furthermore, geometric accuracy by itself does not allow evaluating systems that either lack a geometric scene representation or use coarse proxy geometry, such as light field or image-based rendering systems. We propose a unified evaluation approach based on novel view prediction error that can analyze the visual quality of any method able to render novel views from input images. A key advantage of this approach is that it does not require ground-truth geometry, which dramatically simplifies the creation of test datasets and benchmarks. It also allows us to evaluate the quality of an unknown scene during the acquisition and reconstruction process, which is useful for acquisition planning. We evaluate our approach on a range of methods including standard geometry-plus-texture pipelines as well as image-based rendering techniques, compare it to existing geometry-based benchmarks, and demonstrate its utility for a range of use cases. (Comment: 10 pages, 12 figures; submitted to ACM Transactions on Graphics for review)
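    In spirit, the evaluation boils down to a leave-one-out loop: reconstruct from all but one input photo, re-render the held-out view, and score the rendering against the real photograph with an image-space metric. A minimal sketch of that scoring step follows; the choice of mean absolute error and the optional coverage mask are illustrative assumptions, not the paper's prescribed metric.

```python
from typing import Optional
import numpy as np

def view_prediction_error(rendered: np.ndarray, photo: np.ndarray,
                          mask: Optional[np.ndarray] = None) -> float:
    """Image-space error between a rendered novel view and the held-out photograph.

    Both images are float arrays in [0, 1] with shape (H, W, 3). `mask` can
    exclude pixels the renderer could not cover. Mean absolute error is used
    purely for illustration; any image comparison metric can be plugged in.
    """
    err = np.abs(rendered - photo).mean(axis=-1)   # per-pixel error, shape (H, W)
    if mask is not None:
        err = err[mask]
    return float(err.mean())
```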

    LR-CNN: Local-aware Region CNN for Vehicle Detection in Aerial Imagery

    Get PDF
    State-of-the-art object detection approaches such as Fast/Faster R-CNN, SSD, or YOLO have difficulty detecting dense, small targets with arbitrary orientation in large aerial images. The main reason is that using interpolation to align RoI features can result in a loss of accuracy or even of location information. We present the Local-aware Region Convolutional Neural Network (LR-CNN), a novel two-stage approach for vehicle detection in aerial imagery. We enhance translation invariance to detect dense vehicles and address the boundary quantization issue among dense vehicles by aggregating high-precision RoI features. Moreover, we resample high-level semantic pooled features so that they regain location information from the features of a shallower convolutional block. This strengthens the local feature invariance of the resampled features and enables detecting vehicles in arbitrary orientations. The local feature invariance enhances the learning ability of the focal loss function, and the focal loss in turn helps to focus on hard examples. Taken together, our method better addresses the challenges of aerial imagery. We evaluate our approach on several challenging datasets (VEDAI, DOTA), demonstrating a significant improvement over state-of-the-art methods, and show its good generalization ability on the DLR 3K dataset. (Comment: 8 pages)
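    The focal loss mentioned above is the standard formulation of Lin et al., which down-weights easy examples so the detector concentrates on hard ones. A minimal binary version is sketched below; it is the generic loss, not an LR-CNN-specific variant, and the alpha/gamma defaults are the usual values rather than the paper's settings.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Binary focal loss over per-anchor logits and 0/1 targets of the same shape."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)              # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class balancing weight
    return (alpha_t * (1.0 - p_t) ** gamma * ce).mean()
```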

    New acquisition techniques for real objects and light sources in computer graphics

    Get PDF
    Accurate representations of objects and light sources in a scene model are a crucial prerequisite for realistic image synthesis using computer graphics techniques. This thesis presents techniques for the efficient acquisition of real-world objects and real-world light sources, as well as an assessment of the quality of the acquired models. Making use of color management techniques, we set up an appearance reproduction pipeline that ensures the best possible reproduction of local light reflection with the available input and output devices. We introduce a hierarchical model for the subsurface light transport in translucent objects, derive an acquisition methodology, and acquire models of several translucent objects that can be rendered interactively. Since geometry models of real-world objects are often acquired using 3D range scanners, we also present a method based on the concept of modulation transfer functions to evaluate their accuracy. In order to illuminate a scene with realistic light sources, we propose a method to acquire a model of the near-field emission pattern of a light source with optical prefiltering. We apply this method to several light sources with different emission characteristics and demonstrate the integration of the acquired models into both global illumination and hardware-accelerated rendering systems.
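    For the scanner-accuracy part, the abstract only names the modulation transfer function concept; the sketch below shows the textbook edge-based MTF estimate (differentiate the edge spread function, then take the normalized FFT magnitude) as a generic point of reference. It is not the thesis' specific 3D-scanner procedure, and uniform sample spacing is an assumption.

```python
import numpy as np

def mtf_from_edge_profile(edge_profile: np.ndarray, sample_spacing: float):
    """Generic edge-based MTF estimate from a 1D profile measured across a sharp edge."""
    lsf = np.gradient(edge_profile, sample_spacing)        # line spread function
    lsf = lsf / lsf.sum()                                   # normalize to unit area
    mtf = np.abs(np.fft.rfft(lsf))                          # frequency response magnitude
    freqs = np.fft.rfftfreq(lsf.size, d=sample_spacing)     # cycles per unit length
    return freqs, mtf / mtf[0]
```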

    Background Subtraction with Real-time Semantic Segmentation

    Full text link
    Accurate and fast foreground object extraction is very important for object tracking and recognition in video surveillance. Although many background subtraction (BGS) methods have been proposed in the recent past, it is still regarded as a tough problem due to the variety of challenging situations that occur in real-world scenarios. In this paper, we explore this problem from a new perspective and propose a novel background subtraction framework with real-time semantic segmentation (RTSS). Our framework consists of two components, a traditional BGS segmenter $\mathcal{B}$ and a real-time semantic segmenter $\mathcal{S}$. The BGS segmenter $\mathcal{B}$ constructs background models and segments foreground objects. The real-time semantic segmenter $\mathcal{S}$ refines the foreground segmentation outputs as feedback for improving the model updating accuracy. $\mathcal{B}$ and $\mathcal{S}$ work in parallel on two threads. For each input frame $I_t$, the BGS segmenter $\mathcal{B}$ computes a preliminary foreground/background (FG/BG) mask $B_t$. At the same time, the real-time semantic segmenter $\mathcal{S}$ extracts the object-level semantics $S_t$. Then, a set of rules is applied to $B_t$ and $S_t$ to generate the final detection $D_t$. Finally, the refined FG/BG mask $D_t$ is fed back to update the background model. Comprehensive experiments on the CDnet 2014 dataset demonstrate that our method achieves state-of-the-art performance among all unsupervised background subtraction methods while operating in real time, and even performs better than some deep-learning-based supervised algorithms. In addition, our framework is very flexible and has the potential for generalization.
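    The abstract leaves the combination rules abstract; one plausible minimal reading is that confident semantic evidence overrides the BGS decision and the BGS mask is kept everywhere else. The sketch below encodes exactly that; the thresholds and the override logic are illustrative assumptions, not the paper's published rules.

```python
import numpy as np

def combine_masks(bgs_mask: np.ndarray, semantic_prob: np.ndarray,
                  tau_fg: float = 0.8, tau_bg: float = 0.1) -> np.ndarray:
    """Merge the BGS mask B_t with semantic foreground probabilities S_t into D_t.

    `bgs_mask` is a boolean (H, W) FG/BG mask, `semantic_prob` a float (H, W)
    per-pixel foreground probability. Confident semantic pixels override the
    BGS decision; elsewhere the BGS result is trusted.
    """
    d = bgs_mask.copy()
    d[semantic_prob >= tau_fg] = True    # confident semantic foreground
    d[semantic_prob <= tau_bg] = False   # confident semantic background
    return d
```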

    Ambient point clouds for view interpolation

    Get PDF

    Scene Reconstruction and Visualization From Community Photo Collections

    Full text link

    Neural 3D Video Synthesis

    Full text link
    We propose a novel approach for 3D video synthesis that is able to represent multi-view video recordings of a dynamic real-world scene in a compact yet expressive representation that enables high-quality view synthesis and motion interpolation. Our approach takes the high quality and compactness of static neural radiance fields in a new direction: to a model-free, dynamic setting. At the core of our approach is a novel time-conditioned neural radiance field that represents scene dynamics using a set of compact latent codes. To exploit the fact that changes between adjacent frames of a video are typically small and locally consistent, we propose two novel strategies for efficient training of our neural network: 1) an efficient hierarchical training scheme, and 2) an importance sampling strategy that selects the next rays for training based on the temporal variation of the input videos. In combination, these two strategies significantly boost the training speed, lead to fast convergence of the training process, and enable high-quality results. Our learned representation is highly compact and able to represent a 10 second, 30 FPS multi-view video recording from 18 cameras with a model size of just 28 MB. We demonstrate that our method can render high-fidelity wide-angle novel views at over 1K resolution, even for highly complex and dynamic scenes. We perform an extensive qualitative and quantitative evaluation that shows that our approach outperforms the current state of the art. Additional video and information: https://neural-3d-video.github.io/ (Comment: Project website: https://neural-3d-video.github.io)
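    The ray importance-sampling idea can be illustrated in a few lines: weight pixels by how much they vary over time and draw training rays proportionally, with a small floor so static regions are not starved. This sketch only captures the intuition; the paper's exact weighting scheme and the hierarchical schedule it is combined with are not reproduced here.

```python
import numpy as np

def ray_sampling_weights(video: np.ndarray, floor: float = 0.05) -> np.ndarray:
    """Per-pixel sampling probabilities based on temporal variation.

    `video` has shape (T, H, W, 3) for a single camera. Pixels with large
    temporal standard deviation are sampled more often; `floor` keeps static
    regions represented in every batch.
    """
    variation = video.std(axis=0).mean(axis=-1)   # (H, W) temporal std, averaged over RGB
    w = variation / (variation.max() + 1e-8) + floor
    return (w / w.sum()).ravel()                  # flat probability vector over pixels

# Hypothetical usage: sample a batch of 1024 ray indices for this camera.
# idx = np.random.choice(video.shape[1] * video.shape[2], size=1024,
#                        p=ray_sampling_weights(video))
```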

    Optical Filtering for Near Field Photometry with High Order Basis

    Get PDF
    Accurately capturing the near-field emission of complex luminaires is still very difficult. In this paper, we describe a new acquisition pipeline for such luminaires that performs an orthogonal projection onto a given basis in a two-step procedure. First, we use an optical low-pass filter that corresponds to the reconstruction basis to guarantee high-precision measurements. The second step is a numerical process on the acquired data that finalizes the projection. Based on this concept, we introduce new experimental setups for automatic acquisition and perform a detailed error analysis of the acquisition process.
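    The second, numerical step can be pictured as an ordinary least-squares projection of the prefiltered measurements onto the reconstruction basis evaluated at the measurement positions, as in the generic sketch below; the actual numerical procedure and basis choice in the paper may differ.

```python
import numpy as np

def project_onto_basis(samples: np.ndarray, basis_values: np.ndarray) -> np.ndarray:
    """Least-squares projection of measurements onto a reconstruction basis.

    `samples` (N,) are optically prefiltered measurements and `basis_values`
    (N, K) holds the K basis functions evaluated at the N measurement
    directions/positions. Returns the K projection coefficients.
    """
    coeffs, *_ = np.linalg.lstsq(basis_values, samples, rcond=None)
    return coeffs
```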
